docs: generalized architecture, ADR-001 accepted, ADR-002 Vault AWS Secrets Engine #1

Open
wants to merge 27 commits into
base: main

Conversation

arnol377
Collaborator

  • Add template repo rendering, Bedrock discussion notes, and HOW-IT-WORKS docs
  • Incorporate morga471 review feedback and address Teams follow-up questions
  • docs: fix outdated sections in HOW-IT-WORKS.md
  • review: address morga471 round-2 feedback
  • fix: source tf-run toolchain from terraform/support, not scripts/
  • feat: Enhance Terraform execution with proposer and executor templates
  • feat: Add initial buildspec.yml for tf-run-executor configuration
  • docs: ADR-001 webhook auto-apply on merge to main (proposed)
  • docs: ADR-001 update — webhook payload details, commit status writeback, .sc-automation.yml lifecycle
  • docs: Update ADR-001 to clarify webhook-triggered auto-apply process
  • docs: generalized architecture, webhook auto-apply ADR, Vault ADR

Dave Arnold added 11 commits May 11, 2026 16:23
…KS docs

- lambda/app.py: add template_repo + template_vars fields to TfRunRequest;
  merge field_validator to cover both extra_files and template_vars; pass
  both new fields in CodeBuild environmentVariablesOverride
- buildspec.yml: add TEMPLATE_REPO/TEMPLATE_VARS env defaults; new build step
  clones template repo, renders .j2 files via Jinja2 StrictUndefined, copies
  non-.j2 files verbatim; EXTRA_FILES step runs after and overrides
- service-catalog/product-template.yaml: add TemplateRepo + TemplateVars
  parameters and parameter group; wire to Lambda Custom Resource
- docs/HOW-IT-WORKS.md: full end-to-end documentation of the system
- .gitignore: exclude *.tfstate, *.tfvars, .terraform/, terraform_data_dirs/
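
The render-then-override order described in this commit can be sketched roughly as below. This is an illustration under assumptions, not the actual buildspec step: it uses the standard library's `string.Template` (with `$name` placeholders) as a stand-in for Jinja2 with `StrictUndefined`, since both fail loudly on a missing variable; it works on in-memory dicts rather than a cloned repo; and `render_tree` is a hypothetical name.

```python
from string import Template


def render_tree(template_files, template_vars, extra_files):
    """Render template files, then apply EXTRA_FILES overrides.

    template_files: {relative_path: content} from the template repo
    template_vars:  {name: value} used for rendering
    extra_files:    {relative_path: content}; applied last, so it wins
    """
    out = {}
    for path, content in template_files.items():
        if path.endswith(".j2"):
            # Stand-in for Jinja2 StrictUndefined: substitute() raises
            # KeyError when a referenced variable is missing.
            rendered = Template(content).substitute(template_vars)
            out[path[: -len(".j2")]] = rendered
        else:
            # Non-.j2 files are copied verbatim, with no substitution.
            out[path] = content
    # EXTRA_FILES runs after rendering and overrides same-named files.
    out.update(extra_files)
    return out
```

Failing on an undefined variable, rather than rendering an empty string, keeps a half-filled template from reaching a pull request.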
…tions

- buildspec.yml:
  - Clone terraform/support at build time for version governance (no more
    hardcoded 1.9.1/2.49.0 version strings; reads VERSION files from support repo)
  - S3 env vars are now prefixes (TF_BINARY_S3_PREFIX, GH_CLI_S3_PREFIX);
    filenames constructed dynamically from support repo VERSION files
  - Add SSH->HTTPS git URL rewrite so Terraform module SSH sources work via PAT
  - Add conditional cross-account assume-role step (TARGET_ACCOUNT_ID)
  - Add 169.254.170.2 to NO_PROXY (AL2023 ECS credential provider)

- deploy/codebuild.tf: upgrade image to amazonlinux2023-x86_64-standard:4.0

- deploy/variables.tf: rename tf_binary_s3/gh_cli_s3 to *_prefix with updated
  descriptions and defaults

- lambda/app.py: add optional target_account_id field; pass TARGET_ACCOUNT_ID
  to CodeBuild environmentVariablesOverride

- service-catalog/product-template.yaml: add optional TargetAccountId parameter
  with AllowedPattern validation

- docs/HOW-IT-WORKS.md:
  - Document version governance via terraform/support
  - Note the move to the AL2023 (Amazon Linux 2023) build image
  - Replace 'No SSH Git' constraint with SSH->HTTPS rewrite explanation
  - Add cross-account section explaining TARGET_ACCOUNT_ID and required role
  - Add 'Moving This System to a Different Account' section (Teams Q)
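
As a rough illustration of the SSH->HTTPS rewrite mentioned above, the mapping between the two URL forms might look like the sketch below. The commit does not show how buildspec.yml applies the rewrite, so this only demonstrates the URL transformation itself; `ssh_to_https` is a hypothetical helper, and PAT handling is out of scope.

```python
import re

# scp-style SSH remotes, e.g. git@host:org/repo.git
_SCP_STYLE = re.compile(r"^git@(?P<host>[^:/]+):(?P<path>.+)$")
# ssh:// URLs without an explicit port, e.g. ssh://git@host/org/repo.git
_SSH_URL = re.compile(r"^ssh://git@(?P<host>[^:/]+)/(?P<path>.+)$")


def ssh_to_https(url):
    """Return the HTTPS form of an SSH git URL; other URLs pass through."""
    for pattern in (_SCP_STYLE, _SSH_URL):
        match = pattern.match(url)
        if match:
            return f"https://{match.group('host')}/{match.group('path')}"
    return url
```

For example, `git@github.e.it.census.gov:terraform/support.git` maps to `https://github.e.it.census.gov/terraform/support.git`, while a URL that is already HTTPS is returned unchanged.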
- Component overview BUILD phase: add missing step 6 (assume cross-account role)
  and renumber steps 7-8 accordingly
- Component overview POST_BUILD: correct description — Lambda calls GHE API
  directly; PR_URL= line in logs is informational only (not parsed by Lambda)
- Step 7 (Commit and Push): add git -c user.email/user.name config that is
  actually present in buildspec.yml
- Step 10 (Lambda Returns Results): note the informational-only nature of
  PR_URL= more clearly
- CFN outputs table: add snake_case aliases (pull_request_url, repository_url,
  branch_name) that Lambda actually emits alongside PascalCase variants
Code changes:
- lambda/app.py: add 'global' to valid region_dir values (line 546 comment)
- service-catalog/product-template.yaml: add 'global' to RegionDir AllowedValues

Doc changes (HOW-IT-WORKS.md):
- region_dir: document 'global' as valid value for non-regional resources (SSO, IAM)
- Step 8: add tf-run plan vs tf-plan distinction; tf-plan skips symlink setup
- Bootstrapping: rewrite section — remote_state.backend.tf is created by the
  REMOTE-STATE directive in tf-run.data; what must pre-exist is remote_state.yml.
  Running 'tf-run init' is the correct first-time setup path.
- git-secret: add note that CodeBuild cannot run 'git secret reveal' (no GPG key)
- Build Timeout: add EventBridge as future improvement over Lambda polling
- SSH section: add note about service-user SSH key as alternative approach
- Why csvd-dev: add note about future move to operations accounts
- Delete: add recommended decommission path (PR-based removal + manual apply)
- Rebuild Lambda: replace personal home path with generic, use tf apply instead
  of manual aws lambda update-function-code CLI call
- Deploy: remove personal ~/aws-creds and /home/a/arnol377 paths
The canonical versions of tf-run, tf-control.sh, and tf-directory-setup.py
live in github.e.it.census.gov/terraform/support (local-app/ subtree).
We already clone that repo during the INSTALL phase for VERSION governance,
so sourcing the scripts from there costs nothing extra.

- buildspec.yml: cp from /tmp/tf-support/local-app/ instead of CODEBUILD_SRC_DIR/scripts/
- scripts/: remove tf-run, tf-control.sh, tf-directory-setup.py (bundled copies)
  tf-run.py and tf-run.data remain (project-specific)
- docs/HOW-IT-WORKS.md: update both references (component overview + Step 3)
- Updated `TfRunRequest` model to differentiate between propose and apply actions, adding relevant fields for each.
- Refactored `start_codebuild_build` function to handle environment variable overrides based on the action type.
- Implemented logic in `lambda_handler` to manage responses for both propose and apply actions.
- Added new CloudFormation templates for the proposer and executor products, enabling structured Terraform change proposals and applications.
- The proposer template handles rendering templates and opening pull requests, while the executor template applies changes after PR approval.
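
The propose/apply split described above could be sketched along these lines. This is a simplified stand-in, assuming a plain dataclass instead of the real Pydantic `TfRunRequest`; the `ACTION` variable name and the JSON encoding of `TEMPLATE_VARS` are assumptions for illustration. Only the `name`/`value`/`type` entry shape is the documented CodeBuild format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TfRunRequest:
    """Simplified stand-in for the Lambda request model."""
    action: str                                         # "propose" or "apply"
    template_repo: str = ""                             # used by propose
    template_vars: dict = field(default_factory=dict)   # used by propose
    target_account_id: str = ""                         # optional

    def __post_init__(self):
        if self.action not in ("propose", "apply"):
            raise ValueError(f"unknown action: {self.action!r}")


def env_overrides(req):
    """Build an environmentVariablesOverride list for the given action."""
    env = {"ACTION": req.action}  # hypothetical variable name
    if req.action == "propose":
        # Template inputs only matter when rendering a proposal.
        env["TEMPLATE_REPO"] = req.template_repo
        env["TEMPLATE_VARS"] = json.dumps(req.template_vars, sort_keys=True)
    if req.target_account_id:
        env["TARGET_ACCOUNT_ID"] = req.target_account_id
    return [{"name": k, "value": v, "type": "PLAINTEXT"} for k, v in env.items()]
```

Keeping the per-action fields in one function makes it easier to see which variables an apply build never receives.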
- docs/README.md: high-level index with reading paths by use case
- docs/HOW-IT-WORKS.md: reframe from two-product to single Proposer +
  webhook auto-apply; remove executor SC product framing
- docs/decisions/001-webhook-auto-apply.md: status Proposed → Accepted;
  update context and consequences to reflect removal of executor SC product
- docs/decisions/002-vault-aws-secrets-engine.md: new ADR for Vault AWS
  Secrets Engine; dynamic cross-account credentials; per-product IAM scope
  via Proposer terraform apply; account baseline prerequisite pattern
- docs/generalized-terraform-product-architecture.md: new
- docs/template-management.md: Executor flow, .sc-automation.yml schema
- docs/repo-vars-and-secrets.md: CodeBuild environmentVariablesOverride pattern
- docs/workflow-flowcharts.md: Mermaid diagrams for propose/apply flows
- docs/fleet-governance-at-scale.md: new
- docs/service-catalog-census-integration.md: new
- docs/cross-account-visibility.md: new
Dave Arnold added 2 commits May 20, 2026 12:20
- repo-vars-and-secrets.md: remove 'Later rollout (GHA)' callout block;
  the executor is CodeBuild triggered by webhook, not GitHub Actions
- fleet-governance-at-scale.md: remove 'GHA executor rollout phase' note;
  replace with accurate CodeBuild+webhook description
Adds a dedicated section clearly describing what each CodeBuild project does,
how it is triggered, what env vars it receives, and what it does/does not do —
so stakeholders have a single place to understand the two-build model.
arnol377 changed the title from "feature/template repo rendering" to "docs: generalized architecture, ADR-001 accepted, ADR-002 Vault AWS Secrets Engine" on May 20, 2026
Dave Arnold added 14 commits May 20, 2026 13:32
…tory-setup.py)

- buildspec-proposer.yml: install tf-directory-setup.py from terraform/support in
  INSTALL phase; add python-dateutil + pyyaml pip deps.
  BUILD phase: after template rendering, run Python bootstrap step that:
    1. Processes REMOTE-STATE directives in tf-run.data files — derives workspace
       remote_state.yml from layer-level file (identical to tf-run.sh behavior)
    2. Runs tf-directory-setup.py --link none in each workspace with remote_state.yml —
       generates remote_state.backend.tf + .tf.s3/.local/.none variant files + symlink

- buildspec-executor.yml: add note that REMOTE-STATE and tf-directory-setup.py
  steps are idempotent — files already exist from Proposer PR, no new files created

- docs/HOW-IT-WORKS.md: expand BUILD phase step 5 to document the full file
  generation sequence including REMOTE-STATE and tf-directory-setup.py; add
  rationale explaining why all generation must happen in the Proposer

- docs/template-management.md: fix template repo structure diagram — workspace
  remote_state.yml.j2 files removed (wrong); layer-level remote_state.yml.j2 shown;
  workspace tf-run.data with REMOTE-STATE directive shown; add layout rules for
  auto.tfvars profile/region requirement and .j2 source file handling.
  Expand Proposer Build steps to cover REMOTE-STATE + tf-directory-setup.py.
  Add principle callout: PR diff is the complete truth.
…; plan gitignore

buildspec-executor.yml:
- INSTALL: create terraform_latest symlink -> terraform (account repos use TFCOMMAND=terraform_latest)
- INSTALL: mkdir /data/terraform/terraform.d/plugin-cache + providers (required by .tf-control.tfrc)
- BUILD: after tf-run apply, git add symlink re-link + .terraform.lock.hcl and push directly
  to main with [skip ci] to prevent webhook re-trigger
- Add CodeBuild cache block for /data/terraform/terraform.d/plugin-cache (persists
  provider archives across builds via S3)
- Add log note: logs/ is ephemeral, must be in .gitignore

docs/HOW-IT-WORKS.md:
- INSTALL phase: document terraform_latest alias and /data/terraform dir creation
- BUILD phase step 5: document symlink re-link + lock file commit-back with rationale

docs/template-management.md:
- Template structure: add .gitignore and .terraform.lock.hcl to workspace dirs
- Layout rules: add .gitignore required entries (logs/, .terraform/, tfstate*)
- Layout rules: explain .terraform.lock.hcl lifecycle (committed, Executor updates + pushes back)
- Layout rules: explain terraform_latest alias and plugin cache/.tf-control.tfrc behavior
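
The `[skip ci]` marker on the Executor's commit-back is what keeps the push to main from starting another apply. One way a webhook receiver could honor it is sketched below. This is an assumption about where the check lives, since the commit does not say whether it happens in a webhook filter or in code. `should_auto_apply` is a hypothetical name, and `ref` and `head_commit.message` are standard fields of a GitHub push event payload.

```python
SKIP_MARKER = "[skip ci]"


def should_auto_apply(payload):
    """Decide whether a GitHub push event should start an apply build."""
    # Only pushes to main are candidates for auto-apply.
    if payload.get("ref") != "refs/heads/main":
        return False
    # head_commit can be null (for example on branch deletion).
    head = payload.get("head_commit") or {}
    # The Executor's own lock-file/symlink commit carries the marker.
    return SKIP_MARKER not in head.get("message", "")
```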
Core principle: account repos already carry .tf-control, .tf-control.tfrc,
region.tf, credentials.d/, variables.d/ from initial setup. Template repos
provide only the workload-specific delta (new .tf.j2 files + tf-run.data).

Changes:
- Rewrite template-management.md opening to explain delta-overlay model
  and why duplicating standard files would break reusability
- Minimal real example: template-s3-bucket is 3 files total
- New-layer case: layer-level remote_state.yml provided via EXTRA_FILES
  (Lambda Pydantic model builds it from SC form inputs), not from template
- Remove .tf-control, .tf-control.tfrc, region.tf, credentials.d/,
  variables.d/ from template structure diagram (wrong/environment-specific)
- Remove outdated Lambda template organization section (old EKS-only model)
- Replace stale Executor section (was: renders templates + opens PRs)
  with correct model (runs tf-run apply only, commits lock+symlink back)
- Fix Adding checklist: delta files only + EXTRA_FILES note for new layers
Template repos no longer encode layer/workspace as directory nesting.
LAYER and REGION_DIR are already known env vars - the Proposer uses them
to determine the destination path in the account repo.

buildspec-proposer.yml:
- Add TEMPLATE_SOURCE_PATH env var (selects subdirectory variant within repo)
- Rewrite template rendering: dst_root = LAYER/REGION_DIR/ instead of '.'
- Dotfiles at template root (e.g. .sc-automation.yml) go to account repo root
- Document flat layout convention in comments

docs/template-management.md:
- Rewrite What Belongs section: flat structure, show where files land
- template-s3-bucket example is now 3 flat files (not nested infrastructure/west/)
- TEMPLATE_SOURCE_PATH explained inline with multi-variant example
- Remove old Subdirectory Templates section (replaced with inline example)
- tf-run.data, .sc-automation.yml.j2, .terraform.lock.hcl notes updated

docs/HOW-IT-WORKS.md:
- BUILD phase step 3: document flat layout + dotfile root exception
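
The flat-layout rule (files land under LAYER/REGION_DIR/, except dotfiles at the template root) can be illustrated with a small path-mapping sketch. `destination` is a hypothetical helper, and stripping the `.j2` suffix here is an assumption about how rendered names are derived. The sketch applies the dotfile rule literally as stated in this commit and does not model any per-file exceptions.

```python
from pathlib import PurePosixPath


def destination(rel_path, layer, region_dir):
    """Map a file in a flat template repo to its path in the account repo."""
    path = PurePosixPath(rel_path)
    if path.suffix == ".j2":
        # Rendered templates drop the .j2 suffix (assumed).
        path = path.with_suffix("")
    # Dotfiles at the template root go to the account repo root.
    if len(path.parts) == 1 and path.name.startswith("."):
        return str(path)
    # Everything else is placed under LAYER/REGION_DIR/.
    return str(PurePosixPath(layer, region_dir) / path)
```

With `layer="infrastructure"` and `region_dir="west"`, `main.tf.j2` lands at `infrastructure/west/main.tf`, while `.sc-automation.yml.j2` lands at `.sc-automation.yml` in the repo root.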
…emplate model

- Template repos are flat: just .tf.j2 + tf-run.data, no nested layer/region dirs
- LAYER and REGION_DIR are Proposer env vars; files are written to the correct
  path at copy time, not encoded in template directory structure
- Remove lambda/templates/{product_type}/ tree (templates live in the template repo)
- Layer-level remote_state.yml built by Lambda Pydantic model extra_files()
  from validated SC form inputs, not stored in template repo
- Pydantic model example updated with account_alias field + extra_files() method
- Onboarding checklist updated: no skeleton clone, no lambda/templates/ step
… it could either support or be changed to work better with this project
…s index; create ADR-003 for Vault cluster topology; create ADR-004 for account baseline IAM role; create ADR-005 for Service Catalog portfolio sharing strategy
… flow

- buildspec-executor.yml / buildspec.yml: default CROSS_ACCOUNT_ROLE=r-inf-terraform;
  replace hardcoded role name with ${CROSS_ACCOUNT_ROLE} in sts:AssumeRole block
  (interim scaffolding — will be replaced by vault read in CSC-1345)
- deploy/codebuild.tf: add CROSS_ACCOUNT_ROLE env var to executor project
- deploy/iam.tf: StsAssumeRoleCrossAccount allows r-inf-terraform, r-inf-terraform-eks,
  sc-automation-codebuild-role (backwards compat)
- lambda/app.py: add TfRunRequest.cross_account_role field (default: r-inf-terraform);
  pass CROSS_ACCOUNT_ROLE in CodeBuild env overrides for apply action
- docs/decisions/001-webhook-auto-apply.md: add cross_account_role to schema table
- design-docs/CHECKPOINT.md: update with Vault pivot and CSC-1344 blocked status

Jira: CSC-1344 (Blocked on CSC-1345)
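
For reference, the interim CROSS_ACCOUNT_ROLE scaffolding amounts to building a role ARN from the target account and a permitted role name. A minimal sketch, assuming the standard `aws` partition (a different partition would change the ARN prefix) and using the three role names from the deploy/iam.tf change; `cross_account_role_arn` is a hypothetical helper, not code from this PR.

```python
import re

# Role names allowed by StsAssumeRoleCrossAccount per this commit.
ALLOWED_ROLES = {
    "r-inf-terraform",
    "r-inf-terraform-eks",
    "sc-automation-codebuild-role",
}
DEFAULT_ROLE = "r-inf-terraform"


def cross_account_role_arn(target_account_id, role=DEFAULT_ROLE):
    """Build the role ARN an sts:AssumeRole call would target."""
    # AWS account IDs are exactly 12 digits.
    if not re.fullmatch(r"[0-9]{12}", target_account_id):
        raise ValueError("target_account_id must be 12 digits")
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role not permitted: {role!r}")
    # Partition "aws" is an assumption for this sketch.
    return f"arn:aws:iam::{target_account_id}:role/{role}"
```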
Internal deck covering problem statement, architecture, security
benefits (NIST 800-53), government/compliance considerations (BSL 1.1,
OpenBao, FIPS 140-2), phased roadmap, and call to action.

Jira: CSC-1345 CSC-1346